
Can electoral popularity be predicted using socially generated big data?

Authors: Jonathan Bright, Taha Yasseri
Publication date: 01/01/2014

Today, our lives, more digital than ever, leave significant footprints in cyberspace. Large-scale collections of these socially generated footprints, often known as big data, could help us re-investigate different aspects of our collective social behaviour in a quantitative framework. In this contribution we discuss one such possibility: monitoring and predicting the popularity dynamics of candidates and parties by analysing socially generated data on the web during electoral campaigns. Such data offer considerable potential for improving our awareness of popularity dynamics. However, they also suffer from significant drawbacks in terms of representativeness and generalisability. In this paper we discuss potential ways around such problems, suggesting that the nature of different political systems and contexts might lend differing levels of predictive power to certain types of data source. We offer an initial exploratory test of these ideas, focussing on two data streams: Wikipedia page views and Google search queries. On the basis of these data, we present popularity dynamics from real case examples of recent elections in three different countries.
Comment: To appear in Information Technology

arXiv.org e-Print Archive

Oxford University Research Archive

The distorted mirror of Wikipedia: a quantitative analysis of Wikipedia coverage of academics

Authors: Anna Samoilenko, Taha Yasseri
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/12/2013

Modern scholarship creates online footprints galore. Along with traditional metrics of research quality, such as citation counts, online images of researchers and institutions increasingly matter in evaluating academic impact, in decisions about grant allocation, and in promotion. We examined 400 biographical Wikipedia articles on academics from four scientific fields to test whether being featured in the world's largest online encyclopedia is correlated with higher academic notability (assessed through citation counts). We found no statistically significant correlation between Wikipedia article metrics (length, number of edits, number of incoming links from other articles, etc.) and the academic notability of the researchers concerned. We also found no evidence that scientists with better Wikipedia representation are necessarily more prominent in their fields. In addition, we inspected the Wikipedia coverage of notable scientists sampled from the Thomson Reuters list of "highly cited researchers". In each of the examined fields, Wikipedia failed to cover notable scholars properly. Both findings imply that Wikipedia might be producing an inaccurate image of academics on the front end of science. By shedding light on how public perception of academic progress is formed, this study warns that a subjective element might have been introduced into the hitherto structured system of academic evaluation.
Comment: To appear in EPJ Data Science. To obtain the Additional Files and Datasets, e-mail the corresponding author

arXiv.org e-Print Archive

Springer - Publisher Connector

Mining public opinion: why unsuccessful online petitions should not be ignored

Author: Taha Yasseri
Publication venue: London School of Economics and Political Science
Publication date: 04/08/2020

Taha Yasseri argues that by analysing online petition data using computational techniques, politicians can glean fresh insights into the geographic factors influencing constituents’ concerns and the dynamics at play over time, and can gain a deeper awareness of the issues most important to the general public.

LSE Research Online

Modeling the Rise in Internet-based Petitions

Authors: Scott A. Hale, Helen Margetts, Taha Yasseri
Publication date: 14/08/2014

Contemporary collective action, much of which involves social media and other Internet-based platforms, leaves a digital imprint which may be harvested to better understand the dynamics of mobilization. Petition signing is an example of collective action which has gained in popularity with the rising use of social media, and it provides such data for the whole population of petition signatories on a given platform. This paper tracks the growth curves of all 20,000 petitions to the UK government over 18 months, analyzing their rate of growth and outreach mechanisms. Previous research has suggested the importance of the first day to the ultimate success of a petition, but has not examined early growth within that day, which is made possible here by the hourly resolution of the data. The analysis shows that the vast majority of petitions do not achieve any measure of success: over 99 percent fail to get the 10,000 signatures required for an official response, and only 0.1 percent attain the 100,000 required for a parliamentary debate. We analyze the data through a multiplicative process model framework to explain the heterogeneous growth of signatures at the population level. We define and measure an average outreach factor for petitions and show that it decays very fast (reducing to 0.1% after 10 hours). After 24 hours, a petition's fate is virtually set. The findings seem to challenge conventional analyses of collective action from economics and political science, where the production function has been assumed to follow an S-shaped curve.
Comment: Submitted to EPJ Data Science

arXiv.org e-Print Archive

Topic Modelling of Everyday Sexism Project Entries

Authors: Kathryn Eccles, Sophie Melville, Taha Yasseri
Publication date: 05/04/2018

The Everyday Sexism Project documents everyday examples of sexism reported by volunteer contributors from all around the world. It collected 100,000 entries in 13+ languages within the first 3 years of its existence. The content of reports in various languages submitted to Everyday Sexism is a valuable source of crowdsourced information with great potential for feminist and gender studies. In this paper, we take a computational approach to analyzing the content of the reports. We use topic-modelling techniques to extract emerging topics and concepts from the reports and to map the semantic relations between those topics. The resulting picture closely resembles, and adds to, that arrived at through qualitative analysis, showing that this form of topic modelling could be useful for sifting through datasets that had not previously been subject to any analysis. More precisely, we produce a map of topics at two different resolutions of our topic model and discuss the connections between the identified topics. In the low-resolution picture, for instance, we found the topics Public space/Street, Online, Work related/Office, Transport, School, Media harassment, and Domestic abuse. Among these, the strongest connection is between Public space/Street harassment and Domestic abuse and sexism in personal relationships. The strength of the relationships between topics illustrates the fluid and ubiquitous nature of sexism, with no single experience being unrelated to another.
Comment: preprint, under review

arXiv.org e-Print Archive

Oxford University Research Archive

Female scholars need to achieve more for equal public recognition

Authors: Floris Holstege, Menno H. Schellekens, Taha Yasseri
Publication date: 01/01/2019

Different kinds of "gender gap" have been reported in different walks of scientific life, almost always favouring male scientists over female ones. In this work, for the first time, we present a large-scale empirical analysis asking whether female scientists with the same level of scientific accomplishment are as likely as males to be recognised. We focus in particular on Wikipedia, the open online encyclopedia, whose open nature allows us to use it as a proxy for community recognition. We calculate the probability of appearing on Wikipedia as a scientist for both male and female scholars in three different fields. We find that women in Physics, Economics and Philosophy are considerably less likely than men to be recognised on Wikipedia across all levels of achievement.
Comment: Under review

arXiv.org e-Print Archive

Oxford University Research Archive

Wikipedia traffic data and electoral prediction: towards theoretically informed models

Authors: Jonathan Bright, Taha Yasseri
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

The aim of this article is to explore the potential use of Wikipedia page view data for predicting electoral results. Responding to previous critiques of work using socially generated data to predict elections, which have argued that these predictions take place without any understanding of the mechanism that enables them, we first develop a theoretical model which highlights why people might seek information online at election time and how this activity might relate to overall electoral outcomes, focussing especially on how different types of parties, such as new and established parties, might generate different information-seeking patterns. We test this model on a novel dataset drawn from a variety of countries in the 2009 and 2014 European Parliament elections. We show that while Wikipedia offers little insight into absolute vote outcomes, it offers good information about changes both in overall turnout at elections and in vote share for particular parties. These results are used to enhance existing theories about the drivers of aggregate patterns in online information seeking.
Comment: submitted to EPJ Data Science. Additional File 1 available at https://drive.google.com/open?id=0BxaGC-YCTO6SWkJhRXlrMVRYVl

arXiv.org e-Print Archive

Springer - Publisher Connector

Oxford University Research Archive

Gender Imbalance and Spatiotemporal Patterns of Contributions to Citizen Science Projects: The Case of Zooniverse

Authors: Khairunnisa Ibrahim, Samuel Khodursky, Taha Yasseri
Publication venue: 'Frontiers Media SA'
Publication date: 01/06/2021

Citizen science is research undertaken collaboratively by professional scientists and members of the public. Despite the numerous benefits of citizen science both for the advancement of science and for the community of citizen scientists, there is still no comprehensive knowledge of the patterns of contributions and the demography of contributors to citizen science projects. In this paper we provide a first overview of the spatiotemporal and gender distribution of the citizen science workforce by analyzing 54 million classifications contributed by more than 340 thousand citizen science volunteers from 198 countries to one of the largest online citizen science platforms, Zooniverse. First, we report on the uneven geographical distribution of citizen scientists and model the variations among countries based on socio-economic conditions as well as the level of research investment in each country. Analyzing the temporal features of contributions, we report on the high “burstiness” of participation instances, as well as the leisurely nature of participation suggested by the time of day at which the citizen scientists were most active. Finally, we discuss the gender imbalance among online citizen scientists (about 30% female) and compare it with other collaborative projects as well as with the gender distribution in more formal scientific activities. Online citizen science projects need further attention from outside the academic community, and our findings can help attract the attention of public and private stakeholders, as well as inform the design of the platforms and science policy-making processes.

Directory of Open Access Journals

Understanding Communication Patterns in MOOCs: Combining Data Mining and qualitative methods

Authors: Rebecca Eynon, Nabeel Gillani, Isis Hjorth, Taha Yasseri
Publication date: 01/01/2016

Massive Open Online Courses (MOOCs) offer unprecedented opportunities to learn at scale. Within a few years, the phenomenon of crowd-based learning has gained enormous popularity, with millions of learners across the globe participating in courses ranging from Popular Music to Astrophysics. MOOCs have captured the imaginations of many and attracted significant media attention, with The New York Times naming 2012 "The Year of the MOOC." For those engaged in learning analytics and educational data mining, MOOCs have provided an exciting opportunity to develop innovative methodologies that harness big data in education.
Comment: Preprint of a chapter to appear in "Data Mining and Learning Analytics: Applications in Educational Research"

arXiv.org e-Print Archive

Oxford University Research Archive

Computational Courtship: Understanding the Evolution of Online Dating through Large-scale Data Analysis

Authors: Chris Blex, Rachel Dinh, Patrick Gildersleve, Taha Yasseri
Publication date: 28/06/2020

Have we become more tolerant of dating people of different social backgrounds compared to ten years ago? Has the rise of online dating exacerbated or alleviated gender inequalities in modern courtship? Are the most attractive people on these platforms necessarily the most successful? In this work, we examine the mate preferences and communication patterns of male and female users of the online dating site eHarmony over the past decade to identify how attitudes and behaviors have changed over this period. While other studies have investigated disparities in behavior between male and female users, this study is unique in its longitudinal approach. Specifically, we analyze how men and women differ in their preferences for certain traits in potential partners and how those preferences have changed over time. The second line of inquiry investigates to what extent physical attractiveness determines the rate of messages a user receives, and how this relationship varies between men and women. Thirdly, we explore whether the online dating practices of males and females have become more equal over time or whether biases and inequalities have remained constant (or increased). Fourthly, we study behavioral traits in sending and replying to messages based on one's own experience of receiving messages and being replied to. Finally, we find that similarity between profiles is not a predictor of success, except for the number of children and smoking habits. This work could have broader implications for understanding shifting gender norms and social attitudes, as reflected in online courtship rituals. Beyond the data-based research, we connect the results to existing theories concerning the role of ICTs in societal change. As searching for love online becomes increasingly common across generations and geographies, these findings may shed light on how people can build relationships through the Internet.
Comment: Preprint, under review

arXiv.org e-Print Archive